Skip to content

feat(packaging): one-command upgrade — auto-migrate the DB safely on dnf/apt update#569

Merged
remyluslosius merged 5 commits into
mainfrom
feat/auto-upgrade-migrate
Jun 16, 2026
Merged

feat(packaging): one-command upgrade — auto-migrate the DB safely on dnf/apt update#569
remyluslosius merged 5 commits into
mainfrom
feat/auto-upgrade-migrate

Conversation

@remyluslosius

Copy link
Copy Markdown
Contributor

Goal

sudo dnf update -y openwatch* (or apt) should be all an operator does — the package handles the app + DB-schema upgrade automatically, with a backup, without breaking the app or losing data. This implements exactly that for the single-instance appliance model.

What happens on upgrade (automatic)

The RPM %post (when $1 -ge 2) / DEB postinst (when the old-version $2 is set) — upgrade only, never a fresh install — runs openwatch-upgrade.sh:

  1. Skip + warn if the DB is unreachable (don't fail the package).
  2. Stop the service (never run a binary against a mismatched schema).
  3. openwatch migrate --backup-dirpg_dump a restore point, then apply. Each migration is transactional (atomic rollback on failure).
  4. Success → start the new version. Failure → leave the service STOPPED, print the restore path, exit non-zero so dnf/apt flag it. Data stays intact (failed migration rolled back).

Safety properties

  • Password never in argvinternal/dbbackup passes connection params to pg_dump via PG* env only (unit-pinned, AC-01).
  • Fail-closed backup — if the backup fails, it does not migrate (AC-03).
  • Fail-safe service state — a failed migration never leaves a running binary on a half-migrated schema (AC-05). It can't be half-migrated anyway (transactional).
  • Never lose the last restore point — the cleanup timer always keeps the most recent dump (AC-06).

Operator surface

  • openwatch migrate --status — see pending migrations before upgrading.
  • /etc/openwatch/upgrade.confAUTO_BACKUP, BACKUP_DIR, BACKUP_RETENTION_DAYS (config-noreplace).
  • openwatch-backup-cleanup.timer (daily prune), enabled on install/upgrade.
  • docs/runbooks/UPGRADING.md — the one-command flow, recovery, restore-from-backup.

Deliberately out of scope (documented)

  • PostgreSQL ENGINE major-version upgrades (e.g. 15→16) — operator-supervised pg_upgrade, never silently from a scriptlet (it can lose the whole DB). Minor/patch Postgres comes via dnf/apt dependencies.
  • Multi-instance / zero-downtime — the appliance model accepts brief downtime during the migrate.

Verification

  • Built both packages; their scriptlets render the upgrade-only guard; payloads ship the helper, cleanup script, two systemd units, the empty /var/lib/openwatch/backups dir, and upgrade.conf (config).
  • End-to-end against a live DB: migrate --status reports correctly; migrate --backup-dir produced a valid 82 KB pg_dump then migrated; DSN redacted in logs.
  • gofmt/go vet/go build clean; specter check 107 specs; dbbackup + release-upgrade AC-01..07 tests pass.

Spec: new release-upgrade (C-01..05 / AC-01..07).

`dnf update openwatch*` (or apt) now applies pending DB migrations
automatically, with a restore point and a fail-safe service state — for
the single-instance appliance model. The operator runs one command.

Mechanism:
- internal/dbbackup: builds a pg_dump command that takes connection
  params (esp. the password) via PG* env, NEVER argv (no ps leak).
- `openwatch migrate` gains --status (report pending without applying)
  and --backup-dir (pg_dump restore point BEFORE Apply; skipped on a
  fresh DB; FAILS CLOSED — never migrates if the backup fails).
- packaging/common/openwatch-upgrade.sh (RPM %post when $1>=2 / DEB
  postinst when old-version $2 is set — upgrade only, never fresh
  install): stop -> openwatch migrate --backup-dir -> start on success;
  on failure leave the service STOPPED + print the restore path + exit
  non-zero so the package manager surfaces it. Migrations are
  transactional, so a failure rolls back atomically (data intact).
- Backup retention: openwatch-backup-cleanup.timer (daily) prunes dumps
  past BACKUP_RETENTION_DAYS but always keeps the most recent.
  Operator-tunable via /etc/openwatch/upgrade.conf (AUTO_BACKUP, etc.).

Deliberately OUT of scope (documented): PostgreSQL ENGINE major-version
upgrades (operator-supervised pg_upgrade, never from a scriptlet), and
multi-instance/zero-downtime upgrades.

Spec release-upgrade (C-01..05 / AC-01..07); docs/runbooks/UPGRADING.md.
Verified end to end: migrate --status + --backup-dir produce a real
pg_dump then migrate against a live DB; both packages ship + render the
upgrade-only guard.
@github-actions github-actions Bot added documentation Improvements or additions to documentation size/XL labels Jun 16, 2026
Proves the real upgrade path end to end, beyond the source-inspection +
scriptlet-logic tests: install the OLD openwatch RPM (release 1), stand up
Postgres, roll the schema back one migration to simulate the prior
version, then `rpm -U` the NEW RPM (release 2) and assert the package's
%post scriptlet migrated the DB to head (34 -> 35, host_connection_profile
created), took a pre-upgrade backup, and issued the service stop/start.

- packaging/tests/upgrade-container-test.sh runs inside rockylinux:9 (a
  systemctl shim records stop/start since the container has no systemd).
- packaging/tests/run-upgrade-container-test.sh is the one-command host
  driver: builds the two RPM releases and runs the container.
- openwatch-upgrade.sh: OPENWATCH_UPGRADE_CONF / OPENWATCH_SECRETS_ENV
  overrides (default to the production /etc paths) so the test can point
  config + secrets at a scratch fixture. No production behavior change.

Verified locally: RESULT PASS (34 -> 35, backup taken, stop+start issued).
The two scripts were silently caught by .gitignore's blanket `*test*.sh`
and dropped from the previous commit. Add a `!packaging/tests/*test*.sh`
exception so committed test-harness scripts are tracked, and add the
container upgrade test + its host driver.
Adds an `upgrade` job to package-smoke that exercises the full package
UPGRADE path the per-distro `smoke` (fresh install) job can't: build an
old (release 1) + new (release 2) RPM, then in a rockylinux:9 container
install the old, stand up Postgres, roll the schema back one migration,
and `rpm -U` the new — asserting the %post scriptlet migrates the DB to
head, takes a pre-upgrade backup, and stop/starts the service. Runs the
committed packaging/tests/run-upgrade-container-test.sh driver. Fires on
the same packaging-change / tag / dispatch triggers as the rest of the
workflow, so it runs on this PR.
The command is the constant pg_dump, args are package-controlled flags +
an output path (never user input), and connection params travel via env
not argv. Annotate both exec sites with // #nosec G204 (the repo's
convention) so make lint passes.
@remyluslosius remyluslosius merged commit ca9e440 into main Jun 16, 2026
21 checks passed
@remyluslosius remyluslosius deleted the feat/auto-upgrade-migrate branch June 16, 2026 04:23
remyluslosius added a commit that referenced this pull request Jun 16, 2026
- Un-ignore SESSION_LOG.md (.gitignore listed it next to the
  already-tracked BACKLOG.md; both are the session-continuity docs
  CLAUDE.md/BACKLOG reference for provenance) and add it with the
  2026-06-16 handoff: SSH full-matrix + per-host learning (#566),
  packaging fresh-install + auto-upgrade (#564/#569), CI gate speedup
  (#567), settings/cleanup (#561/#562/#563/#568), and the Dependabot
  triage (9 merged / 6 skipped), plus next-steps + gotchas.
- BACKLOG: drop the completed PKG-1/PKG-2 (shipped in #564); add the SSH
  learning follow-up (wire connprofile into discovery/intelligence/
  liveness) and a "Deferred Dependency Upgrades" section (MUI 7→9, eslint
  10 blocked-upstream, cosign-installer v4 signing migration). Bump date.
remyluslosius added a commit that referenced this pull request Jun 16, 2026
- Un-ignore SESSION_LOG.md (.gitignore listed it next to the
  already-tracked BACKLOG.md; both are the session-continuity docs
  CLAUDE.md/BACKLOG reference for provenance) and add it with the
  2026-06-16 handoff: SSH full-matrix + per-host learning (#566),
  packaging fresh-install + auto-upgrade (#564/#569), CI gate speedup
  (#567), settings/cleanup (#561/#562/#563/#568), and the Dependabot
  triage (9 merged / 6 skipped), plus next-steps + gotchas.
- BACKLOG: drop the completed PKG-1/PKG-2 (shipped in #564); add the SSH
  learning follow-up (wire connprofile into discovery/intelligence/
  liveness) and a "Deferred Dependency Upgrades" section (MUI 7→9, eslint
  10 blocked-upstream, cosign-installer v4 signing migration). Bump date.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation size/XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant